Final cleaning and linking of AKB IDs to Yambol Features

The script explores the consistency of datatypes and feature attributes created in previous scripts, streamlines loose ends, and marries the records to their AKB counterparts. Final data are clipped to Yambol boundary and spatial duplicates are identified and filtering facilitated by adding logical campaign-indicating columns. This script can be run standalone and is final. There is no need to run previous scripts unless one wishes to explore the data in their raw state or change some of the processing.

What is AKB

AKB stands for the “Arheologicheska Karta na Bulgaria”, national digital register, where all archaeological sites and places of interest are entered upon their first investigation. The purpose of the present script is to add AKB identifiers assembled by Todor Valchev to the Yambol field observations. The biggest benefit of this exercise is that the AKB reference lets us see if a given mound was excavated and dated. It also allows users with access to AKB to look up the full cultural heritage information associated with the record.

Sites are often entered into AKB upon their first visit, with additional data added upon revisit, excavations, or other analyses.

The AKB link is one reason why you may want to filter the Yambol data for one or another spatial versions: Upto2010 and Post2010. While the Post2010 is based on later revisits and thus more recent and authoritative (especially if driven by the need to update or improve the record), the Upto2010 record IDs were use to create links in AKB and so are more suitable for mapping of cultural heritage data in AKB with the environmental variables present here.

Setup

library(tidyverse)
library(sf)
library(mapview)

Load data - master_sp_enriched

Here we load aggregated and enriched features from Yambol and wider surroundings.

# not deduplicated for final dataset
features <- st_read("../output_data/interim/master_sp_enriched.geojson")
Reading layer `master_sp_enriched' from data source 
  `C:\Users\Adela\Documents\RStudio\MoundMerging2023\output_data\interim\master_sp_enriched.geojson' 
  using driver `GeoJSON'
Simple feature collection with 1484 features and 23 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 420515 ymin: 4647347 xmax: 499612.3 ymax: 4722257
Projected CRS: WGS 84 / UTM zone 35N

Validation I: Check how many of the features are actually mounds

# Filter features by type
features %>%
    group_by(Type) %>%
    tally()
Simple feature collection with 9 features and 2 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 420515 ymin: 4647347 xmax: 499612.3 ymax: 4722257
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 9 × 3
  Type                      n                                           geometry
  <chr>                 <int>                                     <GEOMETRY [m]>
1 Burial Mound           1024 MULTIPOINT ((420515 4693580), (420707.1 4690851),…
2 Burial Mound?            91 MULTIPOINT ((422391.7 4698832), (429156.7 4695478…
3 Extinct Burial Mound    188 MULTIPOINT ((420832.5 4689836), (421875.6 4693594…
4 Extinct Burial Mound?    10 MULTIPOINT ((468291.5 4660440), (469551.2 4669012…
5 Other                    71 MULTIPOINT ((426968.4 4693384), (462171.2 4655463…
6 Surface Scatter          64 MULTIPOINT ((421045.4 4690265), (424207.4 4694969…
7 Tell                      3 MULTIPOINT ((465676.3 4675875), (467848.2 4662406…
8 Tell?                     1                           POINT (483400.7 4697154)
9 Uncertain Feature        32 MULTIPOINT ((437475.3 4690926), (445710.4 4699311…
# Verify the Source in the category 'Other' of Type.

features %>%
    filter(Type == "Other") %>%
    group_by(Source) %>%
    tally()
Simple feature collection with 2 features and 2 fields
Geometry type: MULTIPOINT
Dimension:     XY
Bounding box:  xmin: 426968.4 ymin: 4647347 xmax: 496032.5 ymax: 4700030
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 2 × 3
  Source                  n                                             geometry
  <chr>               <int>                                     <MULTIPOINT [m]>
1 Legacy verification    30 ((426968.4 4693384), (462171.2 4655463), (462343 46…
2 Survey                 41 ((462452.2 4662370), (463837.3 4664726), (463869.9 …

In the tally of “Other” types, there are 34 Legacy verification features and 41 Survey features in the later monitoring and 30 LgV and 41 Survey in the earlier monitoring. While the latter are expected, the 34 verified features required follow up on 27 Dec 2022. Inspection showed that many of the verified features originate not from sunbursts but other map markers, such as rayed squares and triangles. We inspected these in the early seasons but stopped doing so after these turned out not to lead to mounds reliably.

What is hiding under “other”?

features %>%
    filter(Type == "Other" & Source == "Legacy verification") %>%
    group_by(PrincipalSourceOfImpact) %>%
    tally()
Simple feature collection with 13 features and 2 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 426968.4 ymin: 4647347 xmax: 496032.5 ymax: 4693384
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 13 × 3
   PrincipalSourceOfImpact                           n                  geometry
   <chr>                                         <int>            <GEOMETRY [m]>
 1 Agriculture                                       3 MULTIPOINT ((482322.1 46…
 2 Construction                                      5 MULTIPOINT ((467436.7 46…
 3 Erosion                                           2 MULTIPOINT ((462171.2 46…
 4 No observation                                    8 MULTIPOINT ((462907.7 46…
 5 Other                                             4 MULTIPOINT ((474493 4669…
 6 Other (Geodetic point )                           1  POINT (470009.1 4671929)
 7 Other (Military activity )                        1  POINT (471560.1 4669909)
 8 Other (Military construction and geodetic ma…     1  POINT (472794.2 4671237)
 9 Other (geodetic marker )                          1  POINT (469472.6 4669824)
10 Other (geodetic marker)                           1  POINT (463869.9 4659734)
11 Other (military activity and geodetic marker…     1    POINT (478912 4682704)
12 Post-depositional                                 1  POINT (426968.4 4693384)
13 Urban development                                 1  POINT (491526.2 4676393)

We can see that a lot of the map symbols led to geodetic points and benchmarks, 8 were not accessible, and other locations contained water wells and other structures, whose setting prevented us from labelling them as extinct mounds

Validation II: Check for attribute duplicates

Spatial duplication is addressed in 06_Enrich.Rmd

features$TRAP[duplicated(features$TRAP)]
numeric(0)

Validation III: Clean up condition, height and other attribute

Condition was documented on Likert scale from 1 to 5 with 1 being pristine and 5 being extinct. 0 denoted No observation.

unique(features$Condition)
[1] "3" "2" "4" "5" "1" "6" "0"
glimpse(features)
Rows: 1,484
Columns: 24
$ TopoID                  <dbl> 200013, 200016, 200018, 200019, 200020, 200022…
$ Note                    <chr> "Diadopetkova mogila", NA, NA, NA, NA, NA, NA,…
$ TRAP                    <dbl> 9305, 9411, 9416, 9417, 9409, 9314, 9315, 9415…
$ Source                  <chr> "Legacy verification", "Survey", "Legacy verif…
$ Type                    <chr> "Burial Mound", "Burial Mound", "Burial Mound"…
$ LU_Around               <chr> "Scrub", "Scrub", "Annual agriculture", "Pastu…
$ LU_Top                  <chr> "Scrub", "Scrub", "Annual agriculture", "Pastu…
$ Date                    <date> 2010-11-06, 2010-11-07, 2010-11-07, 2010-11-0…
$ DiameterMax             <chr> "25", "15", "20", "15", "30", "30", "20", "25"…
$ HeightMax               <dbl> 1.8, 1.5, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 6.0, 1…
$ Condition               <chr> "3", "2", "3", "4", "2", "5", "2", "5", "2", "…
$ PrincipalSourceOfImpact <chr> NA, "Nodata", "Agriculture", "Looting", "Agric…
$ AllNotes                <chr> "stones of the top", "", "part of the north  s…
$ distBG                  <dbl> 68354.39, 61029.50, 62182.78, 62264.91, 61452.…
$ distTown                <dbl> 9730.7864, 1806.6579, 1303.8512, 1276.7345, 14…
$ distTownBoundary        <dbl> 635.3572, 1806.6579, 1303.8512, 1276.7345, 148…
$ ElevAster_m             <dbl> NA, 242.0523, 187.9270, 185.9483, 215.0000, 22…
$ SlopeAster_degrees      <dbl> NA, 3.527693e+00, 3.088841e+00, 3.120401e+00, …
$ AspectAster             <dbl> NA, 133.30900, 57.23311, 53.89560, 53.73102, 1…
$ TRI                     <dbl> NA, 1.29536438, 1.18449783, 1.00713158, 0.1759…
$ TPI                     <dbl> NA, 5.708199e-01, -2.686462e-01, -9.536743e-06…
$ Roughness               <dbl> NA, 4.6093445, 3.8489990, 4.0048981, 0.8009949…
$ prom250mbuff            <dbl> NA, 7.17, 43.48, 12.04, 11.15, 67.27, 77.14, 1…
$ geometry                <POINT [m]> POINT (441126.7 4710087), POINT (453727.…
features <- features %>%
    dplyr::mutate(Condition = case_when(Condition == 0 ~ "NA", Condition == 6 ~ "5",
        Condition != 0 ~ Condition)) %>%
    dplyr::mutate(Condition = as.factor(Condition)) %>%
    dplyr::mutate(TypeCertainty = case_when(grepl("\\?", Type) ~ "Uncertain", !grepl("\\?",
        Type) ~ "Certain")) %>%
    dplyr::mutate(Type = gsub("\\?", "", Type)) %>%
    dplyr::mutate(DiameterMax = as.numeric(DiameterMax))

levels(features$Condition) = c(1, 2, 3, 4, 5, NA)

features %>%
    group_by(TypeCertainty) %>%
    tally()
Simple feature collection with 2 features and 2 fields
Geometry type: MULTIPOINT
Dimension:     XY
Bounding box:  xmin: 420515 ymin: 4647347 xmax: 499612.3 ymax: 4722257
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 2 × 3
  TypeCertainty     n                                                   geometry
  <chr>         <int>                                           <MULTIPOINT [m]>
1 Certain        1382 ((420515 4693580), (420707.1 4690851), (420832.5 4689836)…
2 Uncertain       102 ((422391.7 4698832), (429156.7 4695478), (429294.9 469564…

Constrain to Yambol Province

# load the boundary
Y_region <- st_read("../input_data/Vectors/YamRegion.shp")
Reading layer `YamRegion' from data source 
  `C:\Users\Adela\Documents\RStudio\MoundMerging2023\input_data\Vectors\YamRegion.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 1 feature and 7 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 431400.8 ymin: 4641684 xmax: 504849.3 ymax: 4727299
Projected CRS: WGS 84 / UTM zone 35N
Y_features <- st_intersection(features, Y_region$geometry)
Y_features  # 1243 features and 24 fields
Simple feature collection with 1260 features and 24 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 433010.4 ymin: 4647347 xmax: 499612.3 ymax: 4722257
Projected CRS: WGS 84 / UTM zone 35N
First 10 features:
  TopoID Note TRAP              Source         Type          LU_Around
2 200016 <NA> 9411              Survey Burial Mound              Scrub
3 200018 <NA> 9416 Legacy verification Burial Mound Annual agriculture
4 200019 <NA> 9417 Legacy verification Burial Mound            Pasture
              LU_Top       Date DiameterMax HeightMax Condition
2              Scrub 2010-11-07          15       1.5         2
3 Annual agriculture 2010-11-07          20       2.0         3
4            Pasture 2010-11-07          15       2.0         4
  PrincipalSourceOfImpact
2                  Nodata
3             Agriculture
4                 Looting
                                                      AllNotes   distBG
2                                                              61029.50
3                       part of the north  slope was ploughted 62182.78
4 S part was mainly ploughted, 50 % of the mound was destroyed 62264.91
  distTown distTownBoundary ElevAster_m SlopeAster_degrees AspectAster      TRI
2 1806.658         1806.658    242.0523           3.527693   133.30900 1.295364
3 1303.851         1303.851    187.9270           3.088841    57.23311 1.184498
4 1276.735         1276.735    185.9483           3.120401    53.89560 1.007132
            TPI Roughness prom250mbuff TypeCertainty                 geometry
2  5.708199e-01  4.609344         7.17       Certain POINT (453727.1 4706737)
3 -2.686462e-01  3.848999        43.48       Certain POINT (452999.1 4707739)
4 -9.536743e-06  4.004898        12.04       Certain   POINT (452997 4707823)
 [ reached 'max' / getOption("max.print") -- omitted 7 rows ]

Within Yambol, we documented 1243 features (early observations).

Execute attribute changes following manual review

This chunk corrects poor data entries in the structured digital forms on the basis of photographs, diaries, and AKB records.

# Type Certainty UNCERTAIN TO CERTAIN
certain <- Y_features %>% 
  filter(TRAP %in% c(9838,9852,8766,8359)) %>% 
  mutate(TypeCertainty = "Certain" )

# Type Burial mound to EXTINCT
extinct <- Y_features %>% 
  filter(TRAP %in% c(9516, 8337, 8388, 8770, 9910, 9911)) %>% 
  mutate(Type = "Extinct Burial Mound" )

# Type Burial mound to Scatter
scatter <- Y_features %>% 
  filter(TRAP == 9883) %>% 
  mutate(Type = "Surface Scatter" )

# Type Burial mound to Other
other <- Y_features %>% 
  filter(TRAP %in% c(8427, 9645))%>% 
  mutate(Type = "Other")

# Swap Source in these two and TypeCertainty
m8763 <- Y_features %>% 
  filter(TRAP == 8763) %>% 
  mutate(Source = "Legacy verification")
m8764 <- Y_features %>% 
  filter(TRAP ==  8764) %>% 
  mutate(Source = "Survey", Type = "Extinct Burial Mound", TypeCertainty = "Uncertain")
  
fixes <- rbind(m8763,m8764, other,scatter,extinct,certain)

class(fixes$distBG)
[1] "numeric"
# Fix source in features with row_update()
colnames(Y_features)
 [1] "TopoID"                  "Note"                   
 [3] "TRAP"                    "Source"                 
 [5] "Type"                    "LU_Around"              
 [7] "LU_Top"                  "Date"                   
 [9] "DiameterMax"             "HeightMax"              
[11] "Condition"               "PrincipalSourceOfImpact"
[13] "AllNotes"                "distBG"                 
[15] "distTown"                "distTownBoundary"       
[17] "ElevAster_m"             "SlopeAster_degrees"     
[19] "AspectAster"             "TRI"                    
[21] "TPI"                     "Roughness"              
[23] "prom250mbuff"            "TypeCertainty"          
[25] "geometry"               
colnames(fixes)
 [1] "TopoID"                  "Note"                   
 [3] "TRAP"                    "Source"                 
 [5] "Type"                    "LU_Around"              
 [7] "LU_Top"                  "Date"                   
 [9] "DiameterMax"             "HeightMax"              
[11] "Condition"               "PrincipalSourceOfImpact"
[13] "AllNotes"                "distBG"                 
[15] "distTown"                "distTownBoundary"       
[17] "ElevAster_m"             "SlopeAster_degrees"     
[19] "AspectAster"             "TRI"                    
[21] "TPI"                     "Roughness"              
[23] "prom250mbuff"            "TypeCertainty"          
[25] "geometry"               
# row_update() fails on the units distBG and geometry columns so we eliminate these for a moment to apply fixes, and then rejoin them.  

features_fixed <- rows_update(
  #remove distBG
  st_drop_geometry(Y_features)[,-14],  
  st_drop_geometry(fixes)[,-14],
  by = "TRAP")

# rejoin geometry and distBG to cleaned source and type data
features_fixed <- features_fixed %>% 
  left_join(Y_features[,c("TRAP", "distBG")], by = "TRAP") %>% 
  st_as_sf()

features_fixed %>% mapview(zcol = "Type")
features_fixed %>% 
  dplyr::filter(Type == "Tell") %>% 
  dplyr::select(Date, TRAP, Source)
Simple feature collection with 4 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 465676.3 ymin: 4662406 xmax: 483400.7 ymax: 4697154
Projected CRS: WGS 84 / UTM zone 35N
        Date TRAP              Source                 geometry
1 2018-09-20 9802              Survey POINT (483395.3 4665848)
2 2022-09-13 9825 Legacy verification POINT (467848.2 4662406)
3 2022-09-22 8777 Legacy verification POINT (483400.7 4697154)
4 2022-09-28 9914 Legacy verification POINT (465676.3 4675875)

Fix missing data notation

AllNotes field and TopoID field sometimes contain zeroes or are outright blank to indicate missing data. Missing data is legitimate, as only Map symbols get TopoID and many mounds were first encountered during Survey. Likewise, AllNotes field is aggregated from annotations in the mobile app, which served to comment on structured data (dropdown options), indicating some deviation or addition. They do not exist everywhere. Let’s convert all these zeroes and blanks to NAs for consistency.

features_fixed$AllNotes <- ifelse(features_fixed$AllNotes == "0" | features_fixed$AllNotes ==
    "", NA, features_fixed$AllNotes)
features_fixed$TopoID <- ifelse(features_fixed$TopoID == "0", NA, features_fixed$TopoID)

Clean up after the changes

rm(certain, extinct, fixes, m8763, m8764, other, scatter)

AKB identifiers

Here we add AKB numbers to feature data so their future analyses can be brought to bear on the environmental attributes documented during survey.

1055 AKB records

We have 1055 AKB numbers among 1466 visited features. The mismatch is because some features were not deemed worthy of registering (extinct or overbuilt status, military bunkers. etc)

AKB <- read_csv("../input_data/MoundsAKBnumbers.csv")
sum(!is.na(AKB$AKB))
[1] 1055
names(AKB)
[1] "TRAP"  "AKB"   "Area"  "Notes"
head(AKB)
# A tibble: 6 × 4
   TRAP     AKB Area     Notes                  
  <dbl>   <dbl> <chr>    <chr>                  
1  8048 2700001 Borisovo <NA>                   
2  9200 2700063 Yambol   <NA>                   
3  8356 2700064 Yambol   <NA>                   
4  8351 2700065 Yambol   second TRAP number 9227
5  9227 2700065 Yambol   second TRAP number 8351
6  8353 2700066 Yambol   second TRAP number 9257
# Which ones are excavated?
AKB %>%
    filter(grepl("[Ee]xcavated", Notes))
# A tibble: 29 × 4
    TRAP     AKB Area     Notes                        
   <dbl>   <dbl> <chr>    <chr>                        
 1  8047 2700149 Borisovo Excavated by Neli Tancheva   
 2  8043 2700151 Borisovo Excavated by Daniela Agre    
 3  8348 2700220 Mogila   Excavated by Piotr - Rome    
 4  8357 2700221 Mogila   Excavated by Volker EBA      
 5  8345 2700222 Mogila   Excavated by Ilia EBA        
 6  8349 2700226 Mogila   Excavated by Piotr - Rome    
 7  6009 2700287 Boyanovo Excavated by Ilia Iliev EBA  
 8  9357 2700289 Boyanovo Excavated by Daniela Agre EBA
 9  7004 2790017 Boyanovo Excavated by Ilia Iliev      
10  7005 2790018 Boyanovo Excavated by Daniela Agre    
# ℹ 19 more rows
# Which AKBs are duplicated?
AKB %>%
    filter(TRAP == 9962)
# A tibble: 2 × 4
   TRAP      AKB Area     Notes              
  <dbl>    <dbl> <chr>    <chr>              
1  9962 10009821 Pravdino <NA>               
2  9962 10009824 Pravdino Feature 2 on photos
which(AKB$TRAP == 9962)
[1] 965 968
# Eliminate duplicates
AKB <- AKB %>%
    slice(-968)

Join AKB to features

Y_features <- features_fixed %>%
    left_join(AKB, by = c("TRAP")) %>%
    rename(AKBNotes = Notes)

mapview(Y_features, zcol = "AKBNotes")
colnames(Y_features)
 [1] "TopoID"                  "Note"                   
 [3] "TRAP"                    "Source"                 
 [5] "Type"                    "LU_Around"              
 [7] "LU_Top"                  "Date"                   
 [9] "DiameterMax"             "HeightMax"              
[11] "Condition"               "PrincipalSourceOfImpact"
[13] "AllNotes"                "distTown"               
[15] "distTownBoundary"        "ElevAster_m"            
[17] "SlopeAster_degrees"      "AspectAster"            
[19] "TRI"                     "TPI"                    
[21] "Roughness"               "prom250mbuff"           
[23] "TypeCertainty"           "distBG"                 
[25] "AKB"                     "Area"                   
[27] "AKBNotes"                "geometry"               
head(Y_features[, 10:27])
Simple feature collection with 6 features and 18 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 452997 ymin: 4706447 xmax: 454596 ymax: 4707823
Projected CRS: WGS 84 / UTM zone 35N
  HeightMax Condition PrincipalSourceOfImpact
1       1.5         2                  Nodata
2       2.0         3             Agriculture
3       2.0         4                 Looting
                                                      AllNotes distTown
1                                                         <NA> 1806.658
2                       part of the north  slope was ploughted 1303.851
3 S part was mainly ploughted, 50 % of the mound was destroyed 1276.735
  distTownBoundary ElevAster_m SlopeAster_degrees AspectAster      TRI
1         1806.658    242.0523           3.527693   133.30900 1.295364
2         1303.851    187.9270           3.088841    57.23311 1.184498
3         1276.735    185.9483           3.120401    53.89560 1.007132
            TPI Roughness prom250mbuff TypeCertainty   distBG      AKB
1  5.708199e-01  4.609344         7.17       Certain 61029.50 10001492
2 -2.686462e-01  3.848999        43.48       Certain 62182.78 10001486
3 -9.536743e-06  4.004898        12.04       Certain 62264.91 10001487
             Area AKBNotes                 geometry
1        Drazhevo     <NA> POINT (453727.1 4706737)
2 Hadzhidimitrovo     <NA> POINT (452999.1 4707739)
3 Hadzhidimitrovo     <NA>   POINT (452997 4707823)
 [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
# Are there duplicate AKB numbers?
AKBduplicated <- Y_features %>%
    filter(!is.na(AKB)) %>%
    filter(duplicated(AKB)) %>%
    pull(AKB)

Y_features %>%
    dplyr::filter(AKB %in% AKBduplicated) %>%
    dplyr::select(TRAP, AKB)
Simple feature collection with 34 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 462153.1 ymin: 4661331 xmax: 492400 ymax: 4706973
Projected CRS: WGS 84 / UTM zone 35N
First 10 features:
   TRAP      AKB                 geometry
1  9222  2700226 POINT (464359.3 4706914)
2  9225 10007352 POINT (464401.3 4706968)
3  9226 10007350   POINT (464230 4706932)
4  9220  2700221 POINT (465196.3 4706385)
5  9258  2700067 POINT (462161.1 4705892)
6  9257  2700066 POINT (462153.1 4705829)
7  9216  2700216 POINT (467217.8 4706895)
8  9365 10001274 POINT (474347.1 4671482)
9  9223  2700220 POINT (464354.2 4706928)
10 9224 10007351 POINT (464404.7 4706951)

Beware: In 8 instances [10007351 10007352 2700220 2700226 10007031 10009627 10009305 10001274 10009811], AKB numbers are duplicated, meaning that 17 TRAP mounds share an AKB number.

How many features/mounds have been excavated in Yambol by 2023

Y_features %>%
    filter(AKB > 0) %>%
    filter(grepl("Excav", AKBNotes))  #  28 excavated in both datasets
Simple feature collection with 28 features and 27 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 447232.5 ymin: 4662878 xmax: 493995.2 ymax: 4715727
Projected CRS: WGS 84 / UTM zone 35N
First 10 features:
  TopoID Note TRAP              Source         Type LU_Around  LU_Top
1 200029 <NA> 9431 Legacy verification Burial Mound   Pasture Pasture
2 200080 <NA> 9430 Legacy verification Burial Mound   Pasture Pasture
        Date DiameterMax HeightMax Condition PrincipalSourceOfImpact
1 2010-11-10          20         2         5                 Looting
2 2010-11-10          10        NA         5                 Looting
                                                         AllNotes  distTown
1                                           one big trench  10x10 1006.1010
2 one  big trench 9x6, totally destroyed, check excavation report  966.9691
  distTownBoundary ElevAster_m SlopeAster_degrees AspectAster      TRI
1        1006.1010    144.0196           3.487972   353.33450 1.409735
2         966.9691    147.0415           3.966855    28.08201 1.574440
        TPI Roughness prom250mbuff TypeCertainty   distBG      AKB     Area
1 0.7014713  4.790833        83.09       Certain 64932.45 10001377 Drazhevo
2 0.6184120  5.340210        29.14       Certain 64845.70 10001376 Drazhevo
                 AKBNotes                 geometry
1 Excavated by Ilia Iliev POINT (455348.3 4711125)
2 Excavated by Ilia Iliev   POINT (455447 4711057)
 [ reached 'max' / getOption("max.print") -- omitted 8 rows ]
Y_features %>%
    filter(AKB > 0)  # 1030/1040 early/later features have AKB >> duplicates must be here
Simple feature collection with 1047 features and 27 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 433010.4 ymin: 4647347 xmax: 499612.3 ymax: 4722135
Projected CRS: WGS 84 / UTM zone 35N
First 10 features:
  TopoID Note TRAP              Source         Type          LU_Around
1 200016 <NA> 9411              Survey Burial Mound              Scrub
2 200018 <NA> 9416 Legacy verification Burial Mound Annual agriculture
              LU_Top       Date DiameterMax HeightMax Condition
1              Scrub 2010-11-07          15       1.5         2
2 Annual agriculture 2010-11-07          20       2.0         3
  PrincipalSourceOfImpact                               AllNotes distTown
1                  Nodata                                   <NA> 1806.658
2             Agriculture part of the north  slope was ploughted 1303.851
  distTownBoundary ElevAster_m SlopeAster_degrees AspectAster      TRI
1         1806.658    242.0523           3.527693   133.30900 1.295364
2         1303.851    187.9270           3.088841    57.23311 1.184498
         TPI Roughness prom250mbuff TypeCertainty   distBG      AKB
1  0.5708199  4.609344         7.17       Certain 61029.50 10001492
2 -0.2686462  3.848999        43.48       Certain 62182.78 10001486
             Area AKBNotes                 geometry
1        Drazhevo     <NA> POINT (453727.1 4706737)
2 Hadzhidimitrovo     <NA> POINT (452999.1 4707739)
 [ reached 'max' / getOption("max.print") -- omitted 8 rows ]

Fix column names for legibility

names(Y_features)
 [1] "TopoID"                  "Note"                   
 [3] "TRAP"                    "Source"                 
 [5] "Type"                    "LU_Around"              
 [7] "LU_Top"                  "Date"                   
 [9] "DiameterMax"             "HeightMax"              
[11] "Condition"               "PrincipalSourceOfImpact"
[13] "AllNotes"                "distTown"               
[15] "distTownBoundary"        "ElevAster_m"            
[17] "SlopeAster_degrees"      "AspectAster"            
[19] "TRI"                     "TPI"                    
[21] "Roughness"               "prom250mbuff"           
[23] "TypeCertainty"           "distBG"                 
[25] "AKB"                     "Area"                   
[27] "AKBNotes"                "geometry"               
Y_features <- Y_features %>%
    rename(TopoNote = Note, LanduseAround = LU_Around, LanduseOnTop = LU_Top, DiameterMax_m = DiameterMax,
        HeightMax_m = HeightMax, DistTown_m = distTown, DistTownBoundary_m = distTownBoundary,
        Prominence_at250m = prom250mbuff, DistBG_m = distBG, AKBArea = Area)

Mark duplicates

A number of features were visited multiple times and registered under different numbers. Spatial duplicates and triplicates are mounds that share the same location (often are within 15 m of one another) but have unique ID.

We have two lists of matching TRAP IDs for 2009-2010 mounds and their post-2010 duplicates.

# Early records
upto2010 <- c(6011, 8022:8025, 8028, 8029, 8030, 8035, 8350:8353, 8357, 8359, 8434,
    8669, 9077)

# Later records
post2010 <- c(9357, 9594, 9595, 9593, 9596, 9592, 9591, 9358, 8202, 9226, 9227, 9258,
    9257, 9220, 9219, 9216, 9740, 9715)

# to see the pairs, they are collated in output data folder
duplicates <- read.csv("../output_data/interim/duplicates_final.txt", sep = " ")

In order to highlight the duplicates and allow users to filter the early or later variant (2009-2010 or post-2010), we create a SpatialDuplicate column, declare the paired TRAP ID in the PairedID column, and create a Pre2010Variant and Post2010Variant columns to facilitate filtering of the early and later version.

# Label the spatial versions/duplicates as such
Y_features$SpatialDuplicate <- ifelse(Y_features$TRAP %in% duplicates$upto2010 |
    Y_features$TRAP %in% duplicates$post2010, "Duplicate", "Unique")

Add PairedID column and fill in with other version TRAP id where applicable

# Initialize PairedID column with NA
Y_features$PairedID <- NA

# For upto2010 records, use the TRAP id in post2010 column of duplicates
Y_features$PairedID[Y_features$TRAP %in% duplicates$upto2010] <- duplicates$post2010[match(Y_features$TRAP[Y_features$TRAP %in%
    duplicates$upto2010], duplicates$upto2010)]

# For post2010 records, use TRAP id from upto2010 from duplicates
Y_features$PairedID[Y_features$TRAP %in% duplicates$post2010] <- duplicates$upto2010[match(Y_features$TRAP[Y_features$TRAP %in%
    duplicates$post2010], duplicates$post2010)]

Y_features %>%
    st_drop_geometry() %>%
    dplyr::select(TRAP, PairedID, SpatialDuplicate) %>%
    filter(SpatialDuplicate == "Duplicate")
   TRAP PairedID SpatialDuplicate
1  9226     8350        Duplicate
2  9220     8357        Duplicate
3  9258     8352        Duplicate
4  9257     8353        Duplicate
5  9216     8434        Duplicate
6  9358     8030        Duplicate
7  9227     8351        Duplicate
8  9219     8359        Duplicate
9  9357     6011        Duplicate
10 6011     9357        Duplicate
11 8035     8202        Duplicate
12 8022     9594        Duplicate
13 8024     9593        Duplicate
14 8023     9595        Duplicate
15 8025     9596        Duplicate
16 8028     9592        Duplicate
17 8029     9591        Duplicate
18 8030     9358        Duplicate
19 8350     9226        Duplicate
20 8351     9227        Duplicate
21 8352     9258        Duplicate
22 8353     9257        Duplicate
23 8357     9220        Duplicate
24 8359     9219        Duplicate
25 9591     8029        Duplicate
 [ reached 'max' / getOption("max.print") -- omitted 10 rows ]

Create two columns Upto2010 and Post2010 to facilitate filtering by different versions.

`%nin%` <- Negate(`%in%`)
Y_features$Upto2010 <- ifelse(Y_features$TRAP %nin% duplicates$post2010, "Yes", "No")
Y_features$Post2010 <- ifelse(Y_features$TRAP %nin% duplicates$upto2010, "Yes", "No")

# review
Y_features %>%
    st_drop_geometry() %>%
    dplyr::select(TRAP, PairedID, SpatialDuplicate, Upto2010, Post2010) %>%
    filter(SpatialDuplicate == "Duplicate")
   TRAP PairedID SpatialDuplicate Upto2010 Post2010
1  9226     8350        Duplicate       No      Yes
2  9220     8357        Duplicate       No      Yes
3  9258     8352        Duplicate       No      Yes
4  9257     8353        Duplicate       No      Yes
5  9216     8434        Duplicate       No      Yes
6  9358     8030        Duplicate       No      Yes
7  9227     8351        Duplicate       No      Yes
8  9219     8359        Duplicate       No      Yes
9  9357     6011        Duplicate       No      Yes
10 6011     9357        Duplicate      Yes       No
11 8035     8202        Duplicate      Yes       No
12 8022     9594        Duplicate      Yes       No
13 8024     9593        Duplicate      Yes       No
14 8023     9595        Duplicate      Yes       No
15 8025     9596        Duplicate      Yes       No
 [ reached 'max' / getOption("max.print") -- omitted 20 rows ]

Export Yambol features

To export these properly, select your desired version to run the script with either the early or the later dataset. Also, mke sure to select the best format for your further processing: rds(R) or geojson(Python). Default is the _early version and rds format.

# Features in Yambol
names(Y_features)
 [1] "TopoID"                  "TopoNote"               
 [3] "TRAP"                    "Source"                 
 [5] "Type"                    "LanduseAround"          
 [7] "LanduseOnTop"            "Date"                   
 [9] "DiameterMax_m"           "HeightMax_m"            
[11] "Condition"               "PrincipalSourceOfImpact"
[13] "AllNotes"                "DistTown_m"             
[15] "DistTownBoundary_m"      "ElevAster_m"            
[17] "SlopeAster_degrees"      "AspectAster"            
[19] "TRI"                     "TPI"                    
[21] "Roughness"               "Prominence_at250m"      
[23] "TypeCertainty"           "DistBG_m"               
[25] "AKB"                     "AKBArea"                
[27] "AKBNotes"                "geometry"               
[29] "SpatialDuplicate"        "PairedID"               
[31] "Upto2010"                "Post2010"               
glimpse(Y_features)
Rows: 1,260
Columns: 32
$ TopoID                  <dbl> 200016, 200018, 200019, 200020, 200022, 200023…
$ TopoNote                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ TRAP                    <dbl> 9411, 9416, 9417, 9409, 9314, 9315, 9415, 9418…
$ Source                  <chr> "Survey", "Legacy verification", "Legacy verif…
$ Type                    <chr> "Burial Mound", "Burial Mound", "Burial Mound"…
$ LanduseAround           <chr> "Scrub", "Annual agriculture", "Pasture", "Ann…
$ LanduseOnTop            <chr> "Scrub", "Annual agriculture", "Pasture", "Pas…
$ Date                    <date> 2010-11-07, 2010-11-07, 2010-11-07, 2010-11-0…
$ DiameterMax_m           <dbl> 15, 20, 15, 30, 30, 20, 25, 50, 15, 20, 10, 23…
$ HeightMax_m             <dbl> 1.5, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 6.0, 1.0, 2…
$ Condition               <fct> 2, 3, 4, 2, 5, 2, 5, 2, 3, 5, 5, 5, 5, 5, 1, 5…
$ PrincipalSourceOfImpact <chr> "Nodata", "Agriculture", "Looting", "Agricultu…
$ AllNotes                <chr> NA, "part of the north  slope was ploughted", …
$ DistTown_m              <dbl> 1806.65789, 1303.85117, 1276.73451, 1486.64146…
$ DistTownBoundary_m      <dbl> 1806.65789, 1303.85117, 1276.73451, 1486.64146…
$ ElevAster_m             <dbl> 242.0523, 187.9270, 185.9483, 215.0000, 227.64…
$ SlopeAster_degrees      <dbl> 3.5276933, 3.0888411, 3.1204007, 0.5485047, 3.…
$ AspectAster             <dbl> 133.3090024, 57.2331062, 53.8955957, 53.731018…
$ TRI                     <dbl> 1.2953644, 1.1844978, 1.0071316, 0.1759815, 1.…
$ TPI                     <dbl> 5.708199e-01, -2.686462e-01, -9.536743e-06, 1.…
$ Roughness               <dbl> 4.6093445, 3.8489990, 4.0048981, 0.8009949, 4.…
$ Prominence_at250m       <dbl> 7.17, 43.48, 12.04, 11.15, 67.27, 77.14, 17.58…
$ TypeCertainty           <chr> "Certain", "Certain", "Certain", "Certain", "C…
$ DistBG_m                <dbl> 61029.50, 62182.78, 62264.91, 61452.66, 60782.…
$ AKB                     <dbl> 10001492, 10001486, 10001487, 10001371, 100014…
$ AKBArea                 <chr> "Drazhevo", "Hadzhidimitrovo", "Hadzhidimitrov…
$ AKBNotes                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "Excavated…
$ geometry                <POINT [m]> POINT (453727.1 4706737), POINT (452999.…
$ SpatialDuplicate        <chr> "Unique", "Unique", "Unique", "Unique", "Uniqu…
$ PairedID                <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Upto2010                <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes…
$ Post2010                <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes…
# Export
Y_features %>%
    write_rds("../output_data/Y_features.rds")
Y_features %>%
    st_write("../output_data/Y_features.geojson", append = F)
Deleting layer not supported by driver `GeoJSON'
Deleting layer `Y_features' failed
Writing layer `Y_features' to data source 
  `../output_data/Y_features.geojson' using driver `GeoJSON'
Updating existing layer Y_features
Writing 1260 features with 31 fields and geometry type Point.
Y_features %>%
    mutate(X = st_coordinates(.)[, 2], Y = st_coordinates(.)[, 1]) %>%
    mutate(DistBG_m = as.vector(unclass(DistBG_m)), DistTown_m = unclass(DistTown_m),
        DistTownBoundary_m = unclass(DistTownBoundary_m)) %>%
    st_drop_geometry() %>%
    write_csv("../output_data/Y_features.csv")

Filter mounds

Now that the attributes look reasonably well, let’s filter out and export the most likely mounds inside the Yambol Province.

Y_features %>%
    filter(grepl("Mound|Other|Uncertain", Type)) %>%
    group_by(Type) %>%
    tally()
Simple feature collection with 4 features and 2 fields
Geometry type: MULTIPOINT
Dimension:     XY
Bounding box:  xmin: 433010.4 ymin: 4647347 xmax: 499612.3 ymax: 4722135
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 4 × 3
  Type                     n                                            geometry
  <chr>                <int>                                    <MULTIPOINT [m]>
1 Burial Mound           927 ((433010.4 4694365), (433295.2 4693180), (433954.3…
2 Extinct Burial Mound   168 ((433034.9 4693186), (433441.6 4692413), (435565.5…
3 Other                   72 ((445501 4697842), (462171.2 4655463), (462313 468…
4 Uncertain Feature       31 ((445710.4 4699311), (457010.8 4671808), (462076.8…

Export Yambol mounds

Moving to export, .rds, .geojson, and .csv are generated below. For reuse, remember to filter for either the Upto2010 or Post2010 variant. The latter is perhaps more authoritative as the later observations are more “current”. But there is good reason to use the former variant if you need to add AKB information or both variants together if you wish to look for change in observations between teams and seasons.

Y_features %>%
    filter(grepl("[Mm]ound", Type)) %>%
    write_rds("../output_data/Y_mounds.rds")

Y_features %>%
    filter(grepl("[Mm]ound", Type)) %>%
    st_write("../output_data/Y_mounds.geojson", append = FALSE)
Deleting layer not supported by driver `GeoJSON'
Deleting layer `Y_mounds' failed
Writing layer `Y_mounds' to data source 
  `../output_data/Y_mounds.geojson' using driver `GeoJSON'
Updating existing layer Y_mounds
Writing 1095 features with 31 fields and geometry type Point.
Y_features %>%
    filter(grepl("[Mm]ound", Type)) %>%
    mutate(X = st_coordinates(.)[, 2], Y = st_coordinates(.)[, 1]) %>%
    mutate(DistBG_m = as.vector(unclass(DistBG_m)), DistTown_m = unclass(DistTown_m),
        DistTownBoundary_m = unclass(DistTownBoundary_m)) %>%
    st_drop_geometry() %>%
    write_csv("../output_data/Y_mounds.csv")